Simplified HDFS Architecture with Blockchain Distribution of Metadata

Author

  • Deepa S. Kumar
Abstract

Big data storage has become a major challenge because of the rapid growth in the volume, variety, velocity and veracity of data from sources such as social sites, the Internet of Things, mobile users and others. Such data cannot be processed by traditional database systems. Hadoop is a distributed, massively parallel processing system for big data whose storage layer is the Hadoop Distributed File System (HDFS). HDFS is organized as a collection of Data Nodes, which store data as blocks. Information about the data is kept as metadata on expensive, reliable hardware known as the Name Node, which serves as the master server. The existing HDFS architecture suffers from a single point of failure because all metadata is created on a single Name Node. The metadata comprises all information about the distribution of data blocks, replica management and block sizes. This paper proposes a simplified architecture for big data storage that eliminates the master Name Node and distributes its functions using blockchain technology. Metadata creation and blockchain placement were implemented and tested on a cluster of nodes. The Python code for metadata creation runs successfully on the proposed architecture. Using a blockchain in the proposed methodology results in low metadata access delay and thereby improves execution time.
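
The idea of keeping HDFS-style metadata in a hash-chained ledger rather than on a single Name Node can be illustrated with a minimal Python sketch. It is not the paper's code. The names `MetadataBlock` and `MetadataChain`, the round-robin replica placement and the default replication factor of 3 are all assumptions made for illustration, and replicating the chain across the nodes of a cluster is left out.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


@dataclass
class MetadataBlock:
    """One ledger entry: the metadata of a single file (hypothetical layout)."""
    index: int
    file_name: str
    block_size: int
    block_locations: list  # block_locations[i] = Data Nodes holding block i
    prev_hash: str
    hash: str = field(init=False)

    def __post_init__(self):
        self.hash = self.compute_hash()

    def compute_hash(self):
        # Hash every field except the hash itself, in a stable key order.
        payload = json.dumps(
            {
                "index": self.index,
                "file_name": self.file_name,
                "block_size": self.block_size,
                "block_locations": self.block_locations,
                "prev_hash": self.prev_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class MetadataChain:
    """Append-only chain of metadata entries, linked by SHA-256 hashes."""

    def __init__(self):
        self.blocks = []

    def add_file(self, file_name, file_size, block_size, data_nodes, replication=3):
        # Ceiling division: number of data blocks needed for the file.
        n_blocks = max(1, -(-file_size // block_size))
        replicas = min(replication, len(data_nodes))
        # Assumed placement policy: simple round-robin over the Data Nodes.
        locations = [
            [data_nodes[(b + r) % len(data_nodes)] for r in range(replicas)]
            for b in range(n_blocks)
        ]
        prev_hash = self.blocks[-1].hash if self.blocks else GENESIS_HASH
        block = MetadataBlock(len(self.blocks), file_name, block_size, locations, prev_hash)
        self.blocks.append(block)
        return block

    def locate(self, file_name):
        # The most recent entry for a file wins; None if the file is unknown.
        for block in reversed(self.blocks):
            if block.file_name == file_name:
                return block.block_locations
        return None

    def is_valid(self):
        # Each entry must match its own hash and point at its predecessor.
        prev_hash = GENESIS_HASH
        for block in self.blocks:
            if block.prev_hash != prev_hash or block.hash != block.compute_hash():
                return False
            prev_hash = block.hash
        return True
```

In this sketch, any node holding a copy of the chain can answer `locate()` locally and call `is_valid()` to detect tampering, which is the property that would let the Name Node's role be spread across the cluster.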


Similar resources

HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases

Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ sing...

Distributed Metadata Management Scheme in HDFS

The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications. Metadata management is critical to a distributed file system. In the HDFS architecture, a single master server manages all metadata, while a number of data servers store file data. This architecture can’t meet the exponentially increased stor...

Scaling HDFS with a Strongly Consistent Relational Model for Metadata

The Hadoop Distributed File System (HDFS) scales to store tens of petabytes of data despite the fact that the entire file system's metadata must fit on the heap of a single Java virtual machine. The size of HDFS' metadata is limited to under 100 GB in production, as garbage collection events in bigger clusters result in heartbeats timing out to the metadata server (NameNode). In this paper, we addr...

Snapshots in Hadoop Distributed File System

The ability to take snapshots is an essential functionality of any file system, as snapshots enable system administrators to perform data backup and recovery in case of failure. We present a low-overhead snapshot solution for HDFS, a popular distributed file system for large clusters of commodity servers. Our solution obviates the need for complex distributed snapshot algorithms, by taking adva...

Optimistic Concurrency Control in a Distributed NameNode Architecture for Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is the storage layer for Apache Hadoop ecosystem, persisting large data sets across multiple machines. However, the overall storage capacity is limited since the metadata is stored in-memory on a single server, called the NameNode. The heap size of the NameNode restricts the number of data files and addressable blocks persisted in the file system. The H...

Publication date: 2017